Skip to content

Add current_version_number to dataset status CLI output#965

Closed
kaggle-agent wants to merge 1 commit intomainfrom
agent/bovard-20260410182517-e17b1b2b
Closed

Add current_version_number to dataset status CLI output#965
kaggle-agent wants to merge 1 commit intomainfrom
agent/bovard-20260410182517-e17b1b2b

Conversation

@kaggle-agent
Copy link
Copy Markdown
Collaborator

Users polling kaggle datasets status after uploading large datasets
could only see the processing status but not which version was active.
Now the command also fetches dataset metadata via get_dataset() and
displays the version number alongside the status, e.g.,
"ready (version 3)".

Co-authored-by: kaggle-agent kaggle-agent@users.noreply.github.com


Task: bovard-20260410182517-e17b1b2b
Context: https://chat.kaggle.net/kaggle/pl/jh5x1ft3hjnntgtfwaco11nfmh

Users polling `kaggle datasets status` after uploading large datasets
could only see the processing status but not which version was active.
Now the command also fetches dataset metadata via `get_dataset()` and
displays the version number alongside the status, e.g.,
"ready (version 3)".

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
dataset_response = kaggle.datasets.dataset_api_client.get_dataset(dataset_request)
current_version_number = dataset_response.current_version_number

return status, current_version_number
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosbo This breaks backward-compatibility, doesn't it? Not for the CLI usage, but for notebooks that import the package. When I checked, there were tens of thousands of such notebooks, but I don't know how often they are used. And I don't know if any of them call dataset_status().

In general, breaking back-compat is not advised.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had marvin try to one-shot these, but lack enough context on the working of this code to know if it was a good solution. Based on my initial review it didn't look good.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Steve. To avoid breaking script that parses this output, we could add a --format flag like gcloud: https://screenshot.googleplex.com/ganNtm4RhbYHWkS

By default, we would keep the text output with only the status.

But user could specify --format=json and they would get the status AND the current version number in a json output.

Optionally, we could support field selection like gcloud in the format flag:

# If you only want a subset of fields
--format='json(current_version_number)`

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — reverted. dataset_status() now keeps its original str return type so notebooks/scripts importing the package are unaffected. The version-number behavior moved behind a new --format flag at the CLI layer per @rosbo's suggestion.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the gcloud-style --format flag:

  • Default: unchanged text output containing only the status string (back-compat preserved for both CLI users and notebooks importing the package).
  • --format=json: emits {"status": ..., "current_version_number": ...}.
  • --format='json(current_version_number)' and --format='json(status,current_version_number)' for gcloud-style field selection.

dataset_status() is back to returning just str. The extra get_dataset() call to fetch the version number only runs when --format requests JSON, so we don't pay for it on the default path either. Tests in tests/test_dataset_status.py cover the default path, JSON, field selection, and error cases.

@bovard
Copy link
Copy Markdown
Member

bovard commented Apr 24, 2026

Fixees are in #972

@bovard bovard closed this Apr 24, 2026
@bovard bovard deleted the agent/bovard-20260410182517-e17b1b2b branch April 24, 2026 14:46
@kaggle-agent
Copy link
Copy Markdown
Collaborator Author

Closing in favor of #972, which addresses the back-compat concerns raised by @stevemessick and implements the gcloud-style --format flag suggested by @rosbo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants